In March 2012, I discovered the Codea app, allowing coding on an iPad, and the Lua programming language, its foundation. That was the unlikely seed for discovering the functional programming language Haskell in late 2013: a Codea discussion mentioned Project Euler and Haskell solutions to its problems. In particular, a one-line solution for Problem 9: the Pythagorean triplet which sums to 1,000.

[(a,b,c) | a <- [1..500], b <- [1..a-1], c <- [1000-a-b], c < 500, a^2 + b^2 == c^2]

1	[(a,b,c) \| a <- [1..500], b <- [1..a-1], c <- [1000-a-b], c < 500, a^2 + b^2 == c^2]

I was attracted by Haskell’s apparently efficient, mathematical syntax. Later, I realised that there were parallels with Excel, the functional ‘programming language’ that I use every day (informed by the paper Improving the world’s most popular functional language: user-defined functions in Excel by Simon Peyton Jones, Margaret Burnett and Alan Blackwell).

Syntax highlighting

I wanted a (free) syntax highlighter plugin for WordPress version 4.9.4 that would work well with Haskell code. Ultimately, I settled on Crayon Syntax Highlighter by Aram Kocharyan.

I ruled out Code Prettify by Kaspars Dambis because it is based on the Google code-prettify library and I understood that library did not handle Haskell well.

module Main where

main :: IO ()
main = do
  let ps = [(a, b, c) | a <- [1 .. 500]
                      , b <- [1 .. a - 1]
                      , c <- [1000 - a - b]
                      , c < 500
                      , a^2 + b^2 == c^2]
  print ps

module Main where

main :: IO ()

main = do

let ps = [(a, b, c) | a <- [1 .. 500]

, b <- [1 .. a - 1]

, c <- [1000 - a - b]

, c < 500

, a^2 + b^2 == c^2]

print ps

Configuring the theme

My favourite code editor is Visual Studio (VS) Code and I use Justin Adam’s Haskell Syntax Highlighting extension with the default dark theme. I wanted the Crayon plugin to highlight Haskell code in the same way as the extension, to the extent that was possible. Syntax highlighting is the product of a language grammar, which names scopes, and a theme, which associates styles with names. VS Code uses TextMate language grammar and themes. Crayon’s approach is simpler.

The first step was to analyse the VS Code language grammar for Haskell, found in the extension’s file haskell.tmLanguage. This is in the form of an XML-format property list, and I used the plist package to identify the names that it contained. The plist package makes use of the hxt package (the Haskell XML Toolbox).

module Main where

import Data.List (foldl', intercalate, nub, sort)
import Text.XML.HXT.Arrow.XmlState.SystemConfig (no, withSubstDTDEntities,
  withValidate)
import Text.XML.Plist (PlObject (PlArray, PlDict, PlString), readPlistFromFile)
import Paths_processTM (getDataFileName)

main :: IO ()
main = do
  uri <- getDataFileUri "haskell.tmLanguage"
  pl <- readPlistFromFile [ withValidate no
                          , withSubstDTDEntities no ] uri
  let scopes = (intercalate "\n" . sort . nub . names []) pl
  writeFile "scopes.txt" scopes

names :: [String] -> PlObject -> [String]
names ns p
  | PlArray ps <- p = foldl' names ns ps
  | PlDict ds  <- p = foldl' names' ns ds
  | otherwise       = ns

names' :: [String] -> (String, PlObject) -> [String]
names' ns ("name", PlString s) = s : ns
names' ns (_, p) = names ns p

getDataFileUri :: String -> IO String
getDataFileUri dataFile = do
  fileName <- getDataFileName dataFile
  return $ uriFromPath fileName

-- |This function assumes that the fileName is an fully qualified one beginning
-- with a disk designator
uriFromPath :: String -> String
uriFromPath fileName = "file:///" ++ normalise fileName

-- |Windows uses both \ and / as folder separators. This function replaces any
-- \ with /.
normalise :: String -> String
normalise = map repl
 where
  repl '\\' = '/'
  repl c    = c

module Main where

import Data.List (foldl', intercalate, nub, sort)

import Text.XML.HXT.Arrow.XmlState.SystemConfig (no, withSubstDTDEntities,

withValidate)

import Text.XML.Plist (PlObject (PlArray, PlDict, PlString), readPlistFromFile)

import Paths_processTM (getDataFileName)

main :: IO ()

main = do

uri <- getDataFileUri "haskell.tmLanguage"

pl <- readPlistFromFile [ withValidate no

, withSubstDTDEntities no ] uri

let scopes = (intercalate "\n" . sort . nub . names []) pl

writeFile "scopes.txt" scopes

names :: [String] -> PlObject -> [String]

names ns p

| PlArray ps <- p = foldl' names ns ps

| PlDict ds <- p = foldl' names' ns ds

| otherwise = ns

names' :: [String] -> (String, PlObject) -> [String]

names' ns ("name", PlString s) = s : ns

names' ns (_, p) = names ns p

getDataFileUri :: String -> IO String

getDataFileUri dataFile = do

fileName <- getDataFileName dataFile

return $ uriFromPath fileName

-- |This function assumes that the fileName is an fully qualified one beginning

-- with a disk designator

uriFromPath :: String -> String

uriFromPath fileName = "file:///" ++ normalise fileName

-- |Windows uses both \ and / as folder separators. This function replaces any

-- \ with /.

normalise :: String -> String

normalise = map repl

where

repl '\\' = '/'

repl c = c

Initially, hxt could not parse the haskell.tmLanguage file. The XML specification forbids < characters inside elements, but the file had three instances that were not converted into < entity references. It appears that VS Code is tolerant of some mis-specified XML files.

The default dark colour scheme used by VS Code is established in .json files dark_defaults, dark_vs and dark_plus. The following scheme colours are used with the Haskell Syntax Highlighting extension:

Colour	Code	Use	Plugin element
Black	#1E1E1E	Background	Not applicable
Green	#608B4E	Comments	COMMENT
Brown	#CE9178	String literals, character literals	STRING
Tan	#D7BA7D	Escaped character literals	Not replicated
Light green	#B5CEA8	Numerical literals	CONSTANT
Pink	#C586C0	Control flow keywords (`do`, `mdo`, `if`, `then`, `else`, `case` and `of`)	STATEMENT
Blue	#569CD6	Compiler pragmas, other keywords, types, `::`, `->`	PREPROCESSOR, RESERVED, TYPE
Light blue	#9CDCFE	Type variables in `data` or `newtype` declarations.	Not replicated
Blue green	#4EC9B0	Classes in `deriving` declarations.	Not replicated
Yellow	#DCDCAA	Variable names in type signatures, exports and imports.	ENTITY
White	#D4D4D4	Default text	IDENTIFIER, OPERATOR, SYMBOL

Configuring the language grammar

The Crayon plugin’s grammar files for a user-defined language are located in its subfolder of wp-content/uploads/crayon-syntax-highlighter/langs. If a user-defined language has the same folder name as a built-in language folder, the user-defined one takes precedence.

The plugin’s language grammar associates a regular expression (regex) with a list of unique elements, user-defined or built-in to the plugin theme. A user-defined element can be associated with a built-in one. The regex language is PHP’s Perl-compatible regex (PCRE).

The grammar specification includes (undocumented) ‘modes’ CASE_INSENSITIVE, MULTI_LINE and SINGLE_LINE, which are all set by default. The modes can be set (ON, YES or 1) or unset (OFF, NO or 0) with lines such as the following (at least one space is required between the mode name and the =):

CASE_INSENSITIVE =OFF

1	CASE_INSENSITIVE =OFF

The grammar for Haskell provided with Crayon was as follows (for those elements which reference the default language grammar, I have added the default as a following comment):

    NAME                Haskell
    VERSION             1.9.8

    COMMENT             (--.*?$)|({-[^\}]*})
    STRING              (?default)
#   STRING              ((?<![^\\]\\)".*?(?<![^\\]\\)")|((?<![^\\]\\)'.*?(?<![^\\]\\)')
    
    QUALIFIER:VARIABLE  (?<=import)\s+[^\s]+
    RESERVED            \b(?alt:reserved.txt)\b
    TYPE                \b(?alt:type.txt)\b
    
    RECORD:VARIABLE     \b\w+\b\s*(?=::)(?=[^{]*})
    ENTITY              \b\w+\b\s*(?=::)
    ARG:VARIABLE        (\b[\w\t ]+\b(?=\s*->))|((?<=->)\s*\b[\w\t ]+\b\s*$)
    CAPS:VARIABLE       (?-i)\b[A-Z]\w+\b(?i)
    IDENTIFIER          (?default)
#   IDENTIFIER          \b[A-Za-z_]\w*\b
    CONSTANT            (?default)
#   CONSTANT            (?<!\w)[0-9][\.\w]*
    OPERATOR            (?default)
#   OPERATOR            (?alt:operator.txt)
    SYMBOL              (?default)
#   SYMBOL              &[^;]+;|(?alt:symbol.txt)

NAME Haskell

VERSION 1.9.8

COMMENT (--.*?$)|({-[^\}]*})

STRING (?default)

# STRING ((?<![^\\]\\)".*?(?<![^\\]\\)")|((?<![^\\]\\)'.*?(?<![^\\]\\)')

QUALIFIER:VARIABLE (?<=import)\s+[^\s]+

RESERVED \b(?alt:reserved.txt)\b

TYPE \b(?alt:type.txt)\b

RECORD:VARIABLE \b\w+\b\s*(?=::)(?=[^{]*})

ENTITY \b\w+\b\s*(?=::)

ARG:VARIABLE (\b[\w\t ]+\b(?=\s*->))|((?<=->)\s*\b[\w\t ]+\b\s*$)

CAPS:VARIABLE (?-i)\b[A-Z]\w+\b(?i)

IDENTIFIER (?default)

# IDENTIFIER \b[A-Za-z_]\w*\b

CONSTANT (?default)

# CONSTANT (?<!\w)[0-9][\.\w]*

OPERATOR (?default)

# OPERATOR (?alt:operator.txt)

SYMBOL (?default)

# SYMBOL &[^;]+;|(?alt:symbol.txt)

reserved.txt listed keywords but also certain functions from Haskell’s Prelude module. type.txt listed certain types from the Prelude. The default list of operators in operator.txt was =&, <<<, >>>, <<, >>, <<=, =>>, !==, !=, ^=, *=, &=, %=, |=, /=, +=, -=, ===, ==, <>, ->, <=, >=, ++, --, &&, ||, ::, # (escaped), +, -, *, /, %, =, &, |, ^, ~, !, <, > and :. The SYMBOL element matched either HTML character entities or XML entity references (not relevant to Haskell) or the default list of characters in symbol.txt (equivalent to the character class [~`!@#$%()_{}[\]|\\:;,.?]), which was partly duplicative of the default operators.

I considered this supplied grammar to be lacking in certain respects, so I replaced it with the following:

CASE_INSENSITIVE = OFF

PRAGMA:PREPROCESSOR  \{-#.*?#-}
COMMENT              \{-.*?-}|---*([^!#$%&*+./<=>?@\\^|\-~:].*?)?$
PREPROCESSOR         ^\s*#.*?$

PACKAGE:IDENTIFIER    (?<=^import\ssafe\squalified\s|^import\squalified\s|^import\ssafe\s|^import\s)\s*("[\d\-A-Za-z]+(\-\d+(\.\d+)*)?")?\s*[A-Z]['\w]*(\.[A-Z]['\w]*)*

STRING               (?<!['\w])'(\\['"]|\S+?)'|".*?(?<!\\)(\\\\)*"

RESERVED             (?<!['\w])(?alt:reserved.txt)(?!['#\w])
STATEMENT            (?<!['\w])(?alt:statement.txt)(?!['#\w])

MODULE:IDENTIFIER    (?<=^module\s|\sas\s)\s*[A-Z]['\w]*(\.[A-Z]['\w]*)*

RESERVED_OP:TYPE     (?<![!#$%&*+./<=>?@\\^|\-~:])(\((,+|->)?\)|\[]|(?<=\()\.\.(?=\))|::|->|=>)(?![!#$%&*+./<=>?@\\^|\-~:])
CONSTRUCTOR:TYPE     (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[A-Z]['\w]*#?`|([A-Z]['\w]*\.)*:[!#$%&*+./<=>?@\\^|\-~:]+)
OPERATOR             (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[a-z]['\w]*#?`|([A-Z]['\w]*\.)*[!#$%&*+./<=>?@\\^|\-~:]+)

ENTITY               ^[a-z]['\w]*#?\s*(?=::[^!#$%&*+./<=>?@\\^|\-~:])
IDENTIFIER           (?<!['\w])([A-Z]['\w]*\.)*[a-z]['\w]*#?
TYPE                 (?<!['\w])([A-Z]['\w]*\.)*[A-Z]['\w]*#?

CONSTANT             (?<!['\w])(\d+(\.\d+([Ee][+\-]?\d+)?|[Ee][+\-]?\d+)|0([bB][01]+|[oO][0-7]+|[xX][\dA-Fa-f]+(\.[\dA-Fa-f]+([Pp][+\-]?\d+)?|[Pp][+\-]?\d+)?)|\d+)#{0,2}

SYMBOL               [\(),;[\]{}_]

CASE_INSENSITIVE = OFF

PRAGMA:PREPROCESSOR \{-#.*?#-}

COMMENT \{-.*?-}|---*([^!#$%&*+./<=>?@\\^|\-~:].*?)?$

PREPROCESSOR ^\s*#.*?$

PACKAGE:IDENTIFIER (?<=^import\ssafe\squalified\s|^import\squalified\s|^import\ssafe\s|^import\s)\s*("[\d\-A-Za-z]+(\-\d+(\.\d+)*)?")?\s*[A-Z]['\w]*(\.[A-Z]['\w]*)*

STRING (?<!['\w])'(\\['"]|\S+?)'|".*?(?<!\\)(\\\\)*"

RESERVED (?<!['\w])(?alt:reserved.txt)(?!['#\w])

STATEMENT (?<!['\w])(?alt:statement.txt)(?!['#\w])

MODULE:IDENTIFIER (?<=^module\s|\sas\s)\s*[A-Z]['\w]*(\.[A-Z]['\w]*)*

RESERVED_OP:TYPE (?<![!#$%&*+./<=>?@\\^|\-~:])($(,+|->)?$|\[]|(?<=$)\.\.(?=$)|::|->|=>)(?![!#$%&*+./<=>?@\\^|\-~:])

CONSTRUCTOR:TYPE (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[A-Z]['\w]*#?`|([A-Z]['\w]*\.)*:[!#$%&*+./<=>?@\\^|\-~:]+)

OPERATOR (?<![!#$%&*+./<=>?@\\^|\-~:])(`([A-Z]['\w]*\.)*[a-z]['\w]*#?`|([A-Z]['\w]*\.)*[!#$%&*+./<=>?@\\^|\-~:]+)

ENTITY ^[a-z]['\w]*#?\s*(?=::[^!#$%&*+./<=>?@\\^|\-~:])

IDENTIFIER (?<!['\w])([A-Z]['\w]*\.)*[a-z]['\w]*#?

TYPE (?<!['\w])([A-Z]['\w]*\.)*[A-Z]['\w]*#?

CONSTANT (?<!['\w])(\d+(\.\d+([Ee][+\-]?\d+)?|[Ee][+\-]?\d+)|0([bB][01]+|[oO][0-7]+|[xX][\dA-Fa-f]+(\.[\dA-Fa-f]+([Pp][+\-]?\d+)?|[Pp][+\-]?\d+)?)|\d+)#{0,2}

SYMBOL [\(),;[\]{}_]

reserved.txt lists only keywords (other than those in statement.txt). I did not want to treat functions in the Prelude differently from other functions. I used new file statement.txt to list the control flow keywords (including \case) and the STATEMENT element to give them a distinct colour, so that:

{-# LANGUAGE LambdaCase #-}

x :: Bool -> Double -> Double
x test value = if test then value else 0.0

y :: Int -> String
y = \case
      0 -> "zero"
      1 -> "one"
      _ -> "lots"

{-# LANGUAGE LambdaCase #-}

x :: Bool -> Double -> Double

x test value = if test then value else 0.0

y :: Int -> String

y = \case

0 -> "zero"

1 -> "one"

_ -> "lots"

became:

{-# LANGUAGE LambdaCase #-}

x :: Bool -> Double -> Double
x test value = if test then value else 0.0

y :: Int -> String
y = \case
      0 -> "zero"
      1 -> "one"
      _ -> "lots"

{-# LANGUAGE LambdaCase #-}

x :: Bool -> Double -> Double

x test value = if test then value else 0.0

y :: Int -> String

y = \case

0 -> "zero"

1 -> "one"

_ -> "lots"

I distinguished Haskell compiler pragmas from other nested comments, so that:

{-# LANGUAGE Haskell2010 #-}
{- This is a comment -}

1 2	{-# LANGUAGE Haskell2010 #-} {- This is a comment -}

became:

{-# LANGUAGE Haskell2010 #-}
{- This is a comment -}

1 2	{-# LANGUAGE Haskell2010 #-} {- This is a comment -}

The ability of regex to detect context is very limited, so it is not possible to nest ‘nested’ comments, but I wanted the detection of the end of a nested comment to be more accurate, so that:

{- Comment out unwanted code
data Point = Point {x :: Int, y :: Int}
origin :: Point
origin = Point 0 0
-}

{- Comment out unwanted code

data Point = Point {x :: Int, y :: Int}

origin :: Point

origin = Point 0 0

became:

{- Comment out unwanted code
data Point = Point {x :: Int, y :: Int}
origin :: Point
origin = Point 0 0
-}

{- Comment out unwanted code

data Point = Point {x :: Int, y :: Int}

origin :: Point

origin = Point 0 0

I wanted (non-nested) comments to be more accurate, so that:

x --> y -- This is a comment

1	x --> y -- This is a comment

became:

x --> y -- This is a comment

1	x --> y -- This is a comment

The supplied grammar did not recognise that Haskell identifiers can include ', a non-word character in PCRE. This meant taking a different approach to character literals, so that:

foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x

foldl' :: (b -> a -> b) -> b -> t a -> b

foldl' f z0 xs = foldr f' id xs z0

where f' x k z = k $! f z x

became:

foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x

foldl' :: (b -> a -> b) -> b -> t a -> b

foldl' f z0 xs = foldr f' id xs z0

where f' x k z = k $! f z x

The limited ability to detect context meant that I was unable to implement a different colour for escaped character literals. Regex’s lookbehind ((?<= ) or (?<! )) and lookahead ((?= ) or (?! ) facilities provide a basis for detecting some contexts but the lookbehind facility in PCRE requires the look back to be a known number of characters.

Certain keywords are, or can be, followed by the name of a module (module, import, safe, qualified and as). I used regex’s lookbehind to identify such keywords, before matching the module name. In the case of imports, the name of the module may be qualified by the package name. So:

{-# LANGUAGE PackageImports, Safe #-}

module Main (main) where

import qualified "text-1.2.3.0" Data.Text as T (pack)
import "text-1.2.3.0" Data.Text (Text)
import safe Data.Time.Calendar (DayOfWeek (..))

days :: [Int]
days = [1..7]

{-# LANGUAGE PackageImports, Safe #-}

module Main (main) where

import qualified "text-1.2.3.0" Data.Text as T (pack)

import "text-1.2.3.0" Data.Text (Text)

import safe Data.Time.Calendar (DayOfWeek (..))

days :: [Int]

days = [1..7]

became:

{-# LANGUAGE PackageImports, Safe #-}

module Main (main) where

import qualified "text-1.2.3.0" Data.Text as T (pack)
import "text-1.2.3.0" Data.Text (Text)
import safe Data.Time.Calendar (DayOfWeek (..))

days :: [Int]
days = [1..7]

{-# LANGUAGE PackageImports, Safe #-}

module Main (main) where

import qualified "text-1.2.3.0" Data.Text as T (pack)

import "text-1.2.3.0" Data.Text (Text)

import safe Data.Time.Calendar (DayOfWeek (..))

days :: [Int]

days = [1..7]

I did not want to treat types in the Prelude differently from other types. I did not use type.txt but defined a regex for the form of a type. The context-detection limitation meant that data constructors and classes are formatted in the same way as types and I was unable to implement a distinct colour for type variables (formatted like other variables) or classes (formatted like types). For the same reason, -> is formatted in case ... of statements the same way as in type signatures.

The ENTITY element is used for the variable in a type signature. Such a variable is the first thing on a line.

I wanted to treat qualified identifiers the same way as others, so that:

x = Data.List.head xs

1	x = Data.List.head xs

became:

x = Data.List.head xs

1	x = Data.List.head xs

I wanted to implement GHC’s magic hash, so that:

{-# LANGUAGE MagicHash #-}

import GHC.Prim

squareplus4# :: Int# -> Int#
squareplus4# x = x * x + 4# + 0xFF#

{-# LANGUAGE MagicHash #-}

import GHC.Prim

squareplus4# :: Int# -> Int#

squareplus4# x = x * x + 4# + 0xFF#

became:

{-# LANGUAGE MagicHash #-}

import GHC.Prim

squareplus# :: Int# -> Int#
squareplus# x = x * x + 4# + 0xFF#

{-# LANGUAGE MagicHash #-}

import GHC.Prim

squareplus# :: Int# -> Int#

squareplus# x = x * x + 4# + 0xFF#

Rather than use the plugin’s default lists of operators and symbols, I defined regex expressions which reflected the Haskell 2010 Language Report, including infix operators and data constructors, so that:

data Complex a = a :+ a

z :: Complex Double
z = 1.0 :+ 2.0

x :: Int
x = 100 `mod` 13

data Complex a = a :+ a

z :: Complex Double

z = 1.0 :+ 2.0

x :: Int

x = 100 `mod` 13

became:

data Complex a = a :+ a

z :: Complex Double
z = 1.0 :+ 2.0

x :: Int
x = 100 `mod` 13

data Complex a = a :+ a

z :: Complex Double

z = 1.0 :+ 2.0

x :: Int

x = 100 `mod` 13

I wanted the CONSTANT element to match numeric literals more precisely, including hexidecimal floating point literals, so that:

{-# LANGUAGE HexFloatLiterals #-}

module Main where

nums :: [Double]
nums = { 1e-3
       , 0b0101
       , 0o7777
       , 0xF.Fp-4 }

{-# LANGUAGE HexFloatLiterals #-}

module Main where

nums :: [Double]

nums = { 1e-3

, 0b0101

, 0o7777

, 0xF.Fp-4 }

In respect of the regex for the SYMBOL element, when the Crayon plugin processes a regex, it replaces all ( not followed by a ? with (?: except for escaped ( (strictly, all \( sequences). As a consequence, it is necessary to escape ( in regex character classes.

Highlighting Haskell

Syntax highlighting

Configuring the theme

Configuring the language grammar

Rebuilding a Mac mini